Python Job: Site Reliability Engineer - Portugal (Full-Remote)

Company

Boost-IT
Portugal

Location

Remote Position
(From Anywhere/No Office Location)

Job type

Full-Time

Python Job Details

Boost IT is a Portuguese technology consultancy. We are part of one of the most entrepreneurial groups in Portugal, with investments in more than 30 companies.

We want to be known as the most dynamic, energetic and reliable company in the market and, for that, we want to count on you.

If you're passionate about technology and want to work on the most relevant technology projects, then this ad could be for you!

Boost IT. Doing IT. Better

Tasks

The Live Engineering team manages the production reliability of the company's Platform. As part of this team, you will spread reliability love and best practices throughout all of the Platform verticals, and partner with other engineers to build the necessary tools and automation to help us always improve our availability, uptime, responsiveness, and much more.

  • Work on the production availability of the company's platform.
  • Coach teams on how to implement our reliability framework.
  • Spread reliability love and best practices across the organization by being a consulting partner for all areas on topics related to reliability.
  • Improve incident response processes and lead sustainable, efficient, blameless post-mortems and production improvements that can directly affect the company's ecosystem and partners.
  • Partner with other teams on main components and libraries, contributing code and/or design that improves the availability of our production environments and/or teams' productivity.
  • Engage with teams on fairly brief, tightly scoped reliability projects.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.

Requirements

  • Degree in Computer Science or another technical discipline, or equivalent practical experience.
  • Experienced with algorithms, data structures, complexity analysis, and software design.
  • Experienced programming in at least one of the following languages: C#, Java, or Python.
  • Experienced in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Experienced in first-level and second-level incident mitigation.
  • Able to debug and improve code.
  • Familiar with software engineering best practices.
  • Experienced in identifying and addressing toil.
  • Able to weigh multiple approaches to monitoring and judge which best fits each context.
  • A systematic problem-solver with strong communication skills and a sense of ownership and drive.